Database extension structure

ABSTRACT

A digital repository 20 includes data items. A user can add further functionality or program routines to a data item by including, as a data item, a data processing identifier pointing to at least one data processing routine. A client 22 accesses such routines by sending the digital repository 20 a request to access a data item, and the repository returns an interface description document 32 to the client 22 as the response to the request. The client 22 can then transmit a data processing request to a service 26 identified by the data processing identifier in the interface description document 32, the data processing request including an identifier from the interface description document.

FIELD OF INVENTION

The invention relates to a method, apparatus and computer program product for providing extensions to object behaviour in a database environment, particularly but not exclusively in a semistructured or triple-oriented data store.

RELATED ART

Digital object repositories are databases. The term is generally applied to databases designed to hold a variety of different media objects in arbitrary collections, combinations and hierarchies. They may thus be contrasted with conventional relational databases, which store data in predefined tables.

Most digital object repositories, such as DSpace, ARKive and arXiv, have fixed interfaces for accessing objects. These frameworks do not expose programmatic interfaces to the objects in their collections, so they do not allow additional functionality or arbitrary operations to be associated with objects or groups of objects.

A development of this approach is provided by Robert Kahn and Robert Wilensky, “A Framework for Distributed Digital Object Services,” cnri.dlib/tn95-01, May 1995, presently available on the internet at http://www.cnri.reston.va.us/home/cstr/arch/k-w.html. This paper sets out a framework that provides a means of extending the interface of managed objects. “Fedora” is an implementation of this framework.

Such models allow specialised behaviours to be associated with objects by the system administrators.

Conventional repositories do not allow extensions to the data model or the programmatic API that accesses the data model.

SUMMARY OF THE INVENTION

According to the invention, there is provided:

a method of accessing data stored in a digital repository containing data items, comprising:

sending from a client to the digital repository a request to access a data item, wherein the data item stored in the repository includes an identifier pointing to an interface routine;

running the interface routine pointed to by the identifier to obtain an interface description document including at least one identifier pointing to at least one data processing routine;

returning the interface description document from the repository to the client as the response to the request;

transmitting a data processing request from the client to a data processing routine identified by an identifier of the interface description document;

accessing the data in the repository;

applying the requested data processing method to the accessed data to obtain processed data; and

returning to the client the processed data.

The method allows processes to be attached to objects in the digital repository so that they can be invoked simply by requesting the data item, which returns a description document identifying how each process can be invoked.

The invention also relates to a digital repository and a computer program product.

BRIEF DESCRIPTION OF DRAWINGS

For a better understanding of the invention an embodiment will now be described, purely by way of example, with reference to the accompanying drawings, in which:

FIG. 1 illustrates the data structure of data stored in a repository according to the invention;

FIGS. 2 to 6 illustrate steps in a method according to a first embodiment; and

FIG. 7 illustrates a second embodiment of the invention.

DETAILED DESCRIPTION

Referring to FIG. 1, a digital repository 20 according to the example includes a database 2 in which data are stored as nodes 4, each node having properties 6 that may point to other nodes, to data, or to routines.

It will be seen that node 10 has one node id property 8 giving the node id as 10, a unique number representing the node; two data properties 12 storing data; two identifier properties 14 pointing to related nodes; and a routine property 16 pointing to “routine”. This last data item is an identifier pointing to a web service used to create a web service API. In the present embodiment the web service is a routine that generates a WSDL (Web Services Description Language) description of the web service API, and the identifier contains, as a parameter value, at least one further identifier pointing to a data processing routine, as will be described below.

The second node 11 is a child node, labelled 11, which has an identifier to a local routine as property 18, as will be described in more detail below.

While graphs of the form of FIG. 1 conveniently show the tree structure of the data, such data may also be represented in other ways more convenient for storage in a computer.

Thus, for the node 10 at the top left in FIG. 1, the data may be represented by a table of properties and values:

TABLE 1

Property      Value
nodeid        10
collection    100000
collection    100010
child node    11
data          literal_data
data          identifier_to_data
thumbnail     thumbnail_identifier

It will be noted that each property has a value. In some cases the value is a literal; in others it is an identifier to data; and in still others it is an address in the form of a Uniform Resource Identifier, or another type of identifier, identifying the address of the resource, in some cases including parameter values.

In the specific embodiment shown, the identifier chosen is a conventional Uniform Resource Locator (URL), so, for example, the thumbnail value is a thumbnail_url. The node id is a unique identifier for the node, containing in this example the value 10.

Nodes are grouped into collections. Each node may be in one or more collections. In the embodiment, nodes are assigned to collections using a collection property of the node, one such property for each collection to which the node belongs. Table 1 shows that node 10 belongs to the collection with the value 100010, which is an arbitrary numerical label for that collection. Alternatively, instead of storing the collection in the individual nodes, the collection may be stored as a node including identifiers to each node of the collection.

One collection is a root collection to which all nodes belong. This information need not be stored separately for each node. However, for simplicity, in the preferred embodiment the root collection is treated as a normal collection and each node has an identifier pointing to it. In Table 1 the root collection is assigned the numerical value 100000, and node 10 includes an identifier pointing to it, as do all nodes. Some nodes may belong only to the root collection.

Node 10 has two pieces of data associated directly with it, one being a piece of literal data and the other an identifier to a data resource elsewhere.

Table 1 shows the data for only one node. To represent the whole database, the data may be stored as triples of the form (node_id, property, value).
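As a rough illustration, the triple form of Table 1 can be sketched in Python. The sketch assumes string-valued node ids held in an in-memory list; the helper names `properties_of` and `members_of` are hypothetical and not part of the described system. It shows that the per-node property table, including repeated properties such as `collection`, can be rebuilt from the triples.

```python
from collections import defaultdict

# Node 10 of Table 1 as (node_id, property, value) triples (illustrative only).
TRIPLES = [
    ("10", "nodeid", "10"),
    ("10", "collection", "100000"),   # root collection
    ("10", "collection", "100010"),
    ("10", "child node", "11"),
    ("10", "data", "literal_data"),
    ("10", "data", "identifier_to_data"),
    ("10", "thumbnail", "thumbnail_identifier"),
]


def properties_of(triples, node_id):
    """Hypothetical helper: rebuild one node's property table.
    A property may occur several times, so each maps to a list of values."""
    table = defaultdict(list)
    for node, prop, value in triples:
        if node == node_id:
            table[prop].append(value)
    return dict(table)


def members_of(triples, collection_id):
    """Hypothetical helper: ids of all nodes carrying a given collection value."""
    return sorted({n for n, p, v in triples if p == "collection" and v == collection_id})
```

For example, `properties_of(TRIPLES, "10")["collection"]` gives both collection labels, and `members_of(TRIPLES, "100000")` lists every node in the root collection.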

Referring to FIG. 2, the digital repository 20, which includes database 2, is shown together with client 22 and web services 24, 26. In general there will be many more clients, but only one is shown to simplify the drawing.

The digital repository 20, client 22 and web services 24, 26 are networked together; the network links are not shown, and any suitable networking approach may be used.

First, consider how the embodiment allows a routine that does not require arguments to be invoked. An example is provided by property 18, which points to the URL of the routine to be executed; a segment of that URL, here a query string, is an identifier to a local data item 12 on the same node 8.

To invoke the method, all that is required is for a client to request the data represented by property 18. Resolution of the URL of property 18 causes the method to be executed.

For example, the local routine may be a routine that provides a thumbnail image of a larger image on the same node. A client wishing to access the thumbnail image simply requests the data attached to the respective identifier. The request causes the URL of the identifier to be resolved, running a routine that takes the standard image as input and returns the thumbnail. As far as the client 22 is concerned, this is identical to the case where the thumbnail has already been prepared and stored in the repository: the thumbnail is accessed in exactly the same way as if it were actually present in the digital repository 20.
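A minimal sketch of such an argument-free, in-line call follows, assuming the repository keeps a registry from URL paths to local routines and that the query string carries a hypothetical `data` key naming the local data item. The URL `http://repo.example/routines/thumbnail`, the `ROUTINES` registry and the fixed subsampling step are illustrative assumptions. The point is that every argument is fixed inside the stored URL.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical local data store: data item identifier -> image as rows of pixels.
DATA = {"image_12": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]}


def make_thumbnail(image, step=2):
    """Fixed-size thumbnail: keep every step-th pixel in each direction."""
    return [row[::step] for row in image[::step]]


# Hypothetical registry mapping URL paths to local routines.
ROUTINES = {"/routines/thumbnail": make_thumbnail}


def resolve(url):
    """Resolve a property value. The path names the routine and the query
    string names the local data item; since both are fixed in the stored URL,
    the client cannot vary the routine's arguments."""
    parsed = urlparse(url)
    routine = ROUTINES.get(parsed.path)
    if routine is None:
        raise KeyError(f"no routine registered for {parsed.path}")
    data_id = parse_qs(parsed.query)["data"][0]
    return routine(DATA[data_id])
```

Here `resolve("http://repo.example/routines/thumbnail?data=image_12")` returns `[[1, 3], [9, 11]]`, and the client has no way to request a different size.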

As will be appreciated, this simple method cannot be used when the client needs to provide a parameter, such as the size of the thumbnail, since the URL is in effect a method call with static arguments.

For this reason, additional functionality is provided to allow the client also to call routines requiring parameters, for example a parameter indicating the required size of the thumbnail, which can then be varied rather than being predefined.

To access enhanced functionality attached to node 10, the client 22 first sends a query 30 (FIG. 2), in the form of an access request, to the digital repository 20, requesting access to the property 16 of the specific node 10 that points to the parameter method. The digital repository 20 then returns a web service API description, such as a WSDL document 32, including identifiers: in the example, first a URL of the parameter method stored as web service 26, and second a URL pointing to the data 12 of this specific node 10, to allow the web service 26 to access the data of the node.

This may be done in a number of different ways. One approach is for property 16 to be a URL of an interface routine getWebAPI(ParameterMethod_url, Data_url). The interface routine getWebAPI prepares a web service description document, such as a WSDL document, detailing how to access the routine specified by the URLs passed as arguments to getWebAPI. The first argument, ParameterMethod_url, contains the URL of a data processing routine that may be invoked by the client. The second argument, Data_url, is the URL of the data to be processed.

The interface routine may take multiple pairs of parameters to allow multiple URLs to be passed.
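The interface routine getWebAPI with several (ParameterMethod_url, Data_url) pairs might look roughly like the sketch below. It emits a deliberately simplified XML document, not real WSDL; the element names `interface`, `operation`, `endpoint` and `data`, the Python names `get_web_api` and `parse_interface`, and the example URLs are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def get_web_api(*url_pairs):
    """Hypothetical stand-in for getWebAPI(ParameterMethod_url, Data_url, ...).

    Each (method_url, data_url) pair becomes one operation in a simplified
    XML description. This is NOT a complete WSDL document; a real interface
    routine would emit full WSDL or another description format."""
    if not url_pairs:
        raise ValueError("at least one (method_url, data_url) pair is required")
    root = ET.Element("interface")
    for method_url, data_url in url_pairs:
        op = ET.SubElement(root, "operation")
        ET.SubElement(op, "endpoint").text = method_url   # where to send the call
        ET.SubElement(op, "data").text = data_url         # what the routine processes
    return ET.tostring(root, encoding="unicode")


def parse_interface(document):
    """Client side: recover the (method_url, data_url) pairs from the document."""
    root = ET.fromstring(document)
    return [(op.findtext("endpoint"), op.findtext("data"))
            for op in root.findall("operation")]
```

Passing two pairs, for example the same data URL offered by web services 24 and 26, yields a document that `parse_interface` turns back into the same two pairs.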

Alternatively, calls to such data items including URLs may be intercepted in the digital repository 20, which can itself identify that the routine getWebAPI needs to be called. In this case, property 16 need only contain an indication of the location of the routine to be accessed, ParameterMethod_url, together with, where more than one data item 12 attached to the node might be the data item to be processed, an indication of the required data item.
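The interception alternative can be sketched as follows, assuming a node is held as a dictionary whose `routine` property stores only ParameterMethod_url and whose `data` property lists the node's data URLs. The function name `intercept`, the node layout and all URLs are hypothetical. The repository supplies Data_url itself when it is unambiguous and otherwise requires the property to name it.

```python
# Hypothetical node: property 16 holds only ParameterMethod_url, and the node
# has a single data item, so the repository can infer Data_url on its own.
NODE_10 = {
    "routine": "http://ws26.example/prepareThumbnail",
    "data": ["http://repo.example/nodes/10/data/12"],
}


def intercept(node, prop, data_url=None):
    """Repository-side interception: when a client reads a routine property,
    build an interface description instead of returning the raw value."""
    method_url = node[prop]
    if data_url is None:
        candidates = node.get("data", [])
        if len(candidates) != 1:
            # Zero or several data items: the property must say which one is meant.
            raise ValueError("ambiguous data item; property must name Data_url")
        data_url = candidates[0]
    return {"operations": [{"endpoint": method_url, "data": data_url}]}
```

The design choice here is that the explicit `data_url` argument is only needed in the ambiguous case, mirroring the text's "if applicable".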

In general, the web service API description 32 may include details of a number of different web services 24, 26, each of which may provide the method and is identified by a respective data processing URL, and of multiple data items, each identified by a respective data URL.

The web service description document 32 is transmitted back to the client (FIG. 3).

The client then selects one of the available web services specified in the web service description document 32 and calls the chosen web service 26 (FIG. 4). In the example, this web service 26 is on a separate machine, though it is also possible for the web service 26 to be on the digital repository 20 itself. The call 34 to the chosen web service 26 includes the data URL of the required data, taken from the web service description document.

The web service 26 then uses the data URL from the call 34 to obtain the data from the repository 20, as illustrated in FIG. 5, which shows the request 36 for the data and the returned data 38. The web service 26 then executes the selected method on the returned data 38 to obtain processed data, and returns the processed data 40 to the client (FIG. 6).
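The exchange of FIGS. 2 to 6 can be simulated in memory, with network calls replaced by ordinary method calls. In this sketch the class names, URLs and the "upper-case and truncate" processing step are hypothetical stand-ins for a real web service; the `length` argument plays the role of a client-supplied parameter in call 34.

```python
# In-memory simulation of FIGS. 2 to 6; network calls are plain method calls.
# All URLs and the processing step itself are hypothetical.
DATA_URL = "http://repo.example/nodes/10/data/12"
SERVICE_URL = "http://ws26.example/upper"


class Repository:                                   # digital repository 20
    def __init__(self):
        self.data = {DATA_URL: "literal data"}
        # Property 16 of node 10, already expanded into description 32.
        self.descriptions = {("10", "routine"): {
            "operations": [{"endpoint": SERVICE_URL, "data": DATA_URL}]}}

    def access(self, node_id, prop):                # query 30 -> document 32
        return self.descriptions[(node_id, prop)]

    def fetch(self, data_url):                      # request 36 -> data 38
        return self.data[data_url]


class UpperCaseService:                             # web service 26
    def __init__(self, repository):
        self.repository = repository

    def call(self, data_url, length):               # call 34 with a client argument
        data = self.repository.fetch(data_url)      # FIG. 5
        return data.upper()[:length]                # processed data 40


def client_flow(repository, services, node_id, prop, **args):
    description = repository.access(node_id, prop)  # FIGS. 2 and 3
    op = description["operations"][0]               # client picks a service
    service = services[op["endpoint"]]
    return service.call(op["data"], **args)         # FIGS. 4 to 6
```

With `repo = Repository()`, the call `client_flow(repo, {SERVICE_URL: UpperCaseService(repo)}, "10", "routine", length=7)` returns `"LITERAL"`.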

The functionality described above allows data stored in a digital repository to be extended by processes requiring arguments, since the client 22 can supply arguments when calling the web service 26 in call 34.

For example, returning to the thumbnail, a user can add a process for providing a thumbnail to a node by adding a data item 16 containing a call to getWebAPI with two arguments. The first argument is the URL PrepareThumbnail_url of a routine PrepareThumbnail for obtaining the thumbnail, and the second argument is the URL Image_url of the image data from which the thumbnail is prepared. Thus, the data item 16 added to the node 8 may be getWebAPI(PrepareThumbnail_url, Image_url).

When this data item is invoked, the routine getWebAPI returns a web service description document detailing how to call the routine PrepareThumbnail and including the URL Image_url of the relevant image data.

The client can then invoke PrepareThumbnail_url(Image_url, ThumbnailSize) to call the routine PrepareThumbnail, which may be located at web service 26. The routine PrepareThumbnail calls the data repository requesting the data at Image_url and receives the image data in return. PrepareThumbnail then prepares a thumbnail of size ThumbnailSize from the image data and returns the thumbnail.
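The thumbnail example can be sketched end to end: a user attaches data item 16 as an ordinary triple, and a PrepareThumbnail-style routine fetches the image by URL and resizes it to the client-chosen ThumbnailSize. The repository dictionary, the URLs, the function names and the use of nearest-neighbour sampling are assumptions made for illustration; a real service would use an image library.

```python
# Hypothetical repository content: Image_url -> 8x8 image as rows of pixels.
IMAGE_URL = "http://repo.example/nodes/8/image"
REPOSITORY = {IMAGE_URL: [[r * 10 + c for c in range(8)] for r in range(8)]}


def add_thumbnail_method(triples, node_id, method_url, image_url):
    """User-level update through the normal data interface: data item 16
    is just another (node_id, property, value) triple."""
    triples.append((node_id, "routine", f"getWebAPI({method_url},{image_url})"))


def prepare_thumbnail(image_url, thumbnail_size):
    """Sketch of PrepareThumbnail: fetch the image by URL, then shrink it to
    thumbnail_size x thumbnail_size by nearest-neighbour sampling."""
    image = REPOSITORY[image_url]                  # call back into the repository
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // thumbnail_size][c * cols // thumbnail_size]
             for c in range(thumbnail_size)]
            for r in range(thumbnail_size)]
```

For instance, `prepare_thumbnail(IMAGE_URL, 2)` returns `[[0, 4], [40, 44]]`, while a size of 4 samples every second row and column.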

Much more complex routines can be invoked in the same way, simply by adding a suitable URL to the appropriate data item. Users can create services and make them available for specific nodes without needing to reprogram the repository.

Another example is an image library that maintains a collection of stock photographs. A network operator may later want to offer access to this library from mobile devices. The operator has its own service for transcoding the images so that they are suitable for display on mobile devices, and the exact transcoding operation depends on the mobile device making the request. In addition, the image library wants to restrict which images are made available in this way. The network operator therefore selects this subset of the images, puts them in a collection, and then adds to the collection a method call that takes parameters describing the target mobile device and produces transcoded images as output.

Both the method of adding functionality with arguments and the in-line calls without arguments allow individual methods to be added to nodes on a node-by-node basis, simply by adding a suitable data item to the node. Methods can therefore be added by any user with the right to add properties to nodes; there is no need to extend the object's interface, which remains unchanged.

In prior approaches, as far as the inventors are aware, it is either impossible to associate processes and routines with data, or possible only with system administration privileges, by programming specialist behaviours for specific objects. In contrast, the approach of this embodiment allows the additional process to be added through the usual interface that users employ to update the data stored in the digital repository 20.

This example presents a web service description document, such as a WSDL document 32, to the client 22, providing the information needed to call the routines. However, WSDL is far from essential, and any interface description format may be used. Some alternative formats are SSDL, presently described at http://www.ssdl.org/overview.html; Really Simple Web Service Descriptions, presently described at http://webservices.xml.com/lpt/a/ws/2003/10/14/salz.html; and OWL-S, presently described at http://www.daml.org/services/owl-s/1.0/

A repository may optionally support more than one web service description format at the same time, in which case multiple service descriptions may exist for the same node. It is also possible that, to request a service description, a client must supply some information, for example to authenticate its identity. This may result in client-specific filtering of the service description; that is, the exact composition of the generated interface description depends upon the identity of the user and/or some characteristic or attribute of the client system.
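Client-specific filtering and support for more than one description format might be sketched as below. The role names, the `audience` field, the two output formats (`json` and a plain `text` listing) and the function name `describe` are hypothetical; a real repository would derive the role from authentication and would emit real description formats such as WSDL.

```python
import json

# Hypothetical operations attached to one node, each tagged with the client
# roles allowed to see it in a generated description.
OPERATIONS = [
    {"name": "prepareThumbnail", "endpoint": "http://ws26.example/thumb",
     "audience": {"public", "staff"}},
    {"name": "transcodeForMobile", "endpoint": "http://ws24.example/transcode",
     "audience": {"operator"}},
]


def describe(client_role, fmt="json"):
    """Build a client-specific description: only operations whose audience
    includes the authenticated role are listed, in the requested format."""
    visible = [op for op in OPERATIONS if client_role in op["audience"]]
    if fmt == "json":
        return json.dumps([{"name": op["name"], "endpoint": op["endpoint"]}
                           for op in visible])
    if fmt == "text":
        return "\n".join(f"{op['name']} {op['endpoint']}" for op in visible)
    raise ValueError(f"unsupported description format: {fmt}")
```

A `public` client thus sees only the thumbnail operation, while an `operator` client sees only the transcoding operation.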

Further, although the example uses URLs as identifiers and locators, the skilled person will be familiar with other identifiers and locators that may also be used. For example, URLs are a specific type of Uniform Resource Identifier (URI) with a specific resolution scheme for interpreting them as locators. Other possible identifiers, with different resolution schemes, include Handles, presently described at http://www.handle.net/; PURLs, presently described at http://purl.oclc.org/; and LSIDs, presently described at http://www.i3c.org/wgr/ta/resources/lsid/docs/LSIDSyntax9-20-02.htm. In addition, Uniform Resource Names (URNs), presently described at http://www.ietf.org/rfc/rfc2141.txt, are another type of URI, so any URN whose naming scheme has an associated resolution scheme for resolving identifiers to locators is potentially usable here.
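One way a repository could accept several identifier schemes is to dispatch on the identifier's prefix to a resolver that yields a dereferenceable locator. This is a sketch under stated assumptions: the `hdl:` prefix convention and the use of the public Handle HTTP proxy at hdl.handle.net are assumptions, the LSID resolver URL is entirely hypothetical, and PURLs are treated as ordinary HTTP URLs because they resolve by redirection.

```python
from urllib.parse import quote


def resolve_handle(identifier):
    # Assumes the "hdl:" prefix convention and the public Handle HTTP proxy.
    return "https://hdl.handle.net/" + identifier[len("hdl:"):]


def resolve_lsid(identifier):
    # Hypothetical LSID resolution service; a real deployment would use the
    # resolver published by the LSID's authority.
    return "http://lsid-resolver.example/?lsid=" + quote(identifier, safe="")


RESOLVERS = {"hdl:": resolve_handle, "urn:lsid:": resolve_lsid}


def to_locator(identifier):
    """Map an identifier to a locator the repository can dereference."""
    lowered = identifier.lower()
    if lowered.startswith(("http://", "https://")):
        return identifier          # URLs, including PURLs, are already locators
    for prefix, resolver in RESOLVERS.items():
        if lowered.startswith(prefix):
            return resolver(identifier)
    raise ValueError(f"no resolution scheme for {identifier}")
```

Adding a new scheme then only requires registering another entry in `RESOLVERS`.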

The embodiment differs from Fedora in a number of ways, including that (i) the APIs can be customised for individual objects, not just object types, and (ii) any user of the repository, not just the repository owner, can customise the APIs. Users can therefore create functional overlays on networked data in a way that was not previously possible, based upon a native, networked interface that is potentially accessible to all users.

In a second embodiment, illustrated in FIG. 7, collections of nodes are used, instead of adding methods on a node-by-node basis, to reduce the work of adding methods to a number of nodes.

Collection nodes 70, 72 include identifiers 74 to nodes; each node pointed to by a collection node is part of the collection represented by that collection node. Thus, in the example of FIG. 7, both data nodes 8 are part of the first collection 70, but only one of them is part of the second collection 72. The nodes 8 described above with reference to FIG. 1 are still present; they will be referred to as data nodes 8 to distinguish them from the collection nodes 70, 72.

Instead of attaching identifiers 16, 17 of static web service API descriptions to individual nodes, as in the first embodiment, the identifiers are attached to collection nodes 70, 72. In this way, a number of nodes may use the same routine simply by adding each node to the collection. Further, each collection may offer a number of different routines, delivering a variety of web service API descriptions, simply by adding appropriate identifiers to the collection nodes 70, 72.

Alternatively, a single web service API description document returned by a routine 16 pointed to from a collection node may list a number of different data processing routines for execution by web services; in this case the web service API description document effectively defines an interface comprising many different routines.

Note that data nodes are added to a collection using an identifier from the collection node to the data node, and not the other way round. This is so that when a data node is first retrieved, the methods that take arguments are not visible. Instead, a method called node.getInterface is available on the node. When it is called, the digital repository 20 identifies all collections to which the data node belongs, and hence all methods applicable to the data node 8. A single interface description document 32 is then created describing these methods and how to invoke them.

Since the method node.getInterface is defined for all nodes, it need not be included in the definition of each node. If no interface documents are available for a node, the routine returns a null or error message.
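The node.getInterface behaviour of the second embodiment can be sketched by scanning the collection nodes that point at a data node and merging their routines into one description. Because both data nodes in FIG. 7 carry the numeral 8, the ids `8a` and `8b` are hypothetical, as are the routine URLs, the dictionary layout and the function name `get_interface`; returning `None` stands in for the null result when no methods apply.

```python
# Hypothetical second-embodiment store: collection nodes point at data nodes
# (never the reverse) and carry identifiers of data processing routines.
COLLECTIONS = {
    "70": {"members": {"8a", "8b"}, "routines": ["http://ws26.example/thumb"]},
    "72": {"members": {"8a"}, "routines": ["http://ws24.example/transcode"]},
}


def get_interface(data_node_id, collections=COLLECTIONS):
    """Sketch of node.getInterface: find every collection whose members include
    the data node and merge their routines, without duplicates, into a single
    description. Returns None when no routine applies to the node."""
    routines = []
    for collection_id in sorted(collections):
        collection = collections[collection_id]
        if data_node_id in collection["members"]:
            for url in collection["routines"]:
                if url not in routines:
                    routines.append(url)
    if not routines:
        return None
    return {"node": data_node_id, "operations": routines}
```

Note that the data nodes themselves hold no interface properties; all of the information comes from the collection side, as the description explains.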

A further benefit of having the identifiers point from the collection nodes 70, 72 to the data nodes 8 is that a data node 8 then includes no properties related to the interface, avoiding any duplication between the interface description document 32 and the data of the data node.

Note that it is possible to include identifiers 16 to routines both on collection nodes 70, 72, as in the second embodiment, and on data nodes 8, as in the first embodiment, so that methods can be attached to individual nodes where required, as well as to groups of nodes where that is more convenient.

Those skilled in the art will realise that the above embodiments are purely by way of example and that other approaches and details may be used.

For example, although the embodiments above use digital repositories in which the data are stored in nodes, the invention does not require the digital repository to use this model, and alternative data storage arrangements may be used.

CLAIMS

1. A method of accessing data stored in a digital repository containing data items, comprising: sending from a client to the digital repository a request to access a data item, wherein the data item includes a data processing resource identifier pointing to at least one data processing routine; obtaining an interface description document including at least one data processing resource identifier pointing to the at least one data processing routine and at least one data identifier pointing to data; returning the interface description document from the digital repository to the client as the response to the request; transmitting a data processing request from the client to a service identified by the data processing resource identifier of the interface description document, the data processing request including at least one data identifier from the interface description document; accessing the data in the repository using the data identifier; processing the accessed data to obtain processed data; and returning to the client the processed data.
2. A method according to claim 1, wherein the step of accessing the data includes: calling the repository from the web service with the data identifier; and returning the data identified by the data identifier from the repository to the web service as returned data; and the step of processing the accessed data includes processing the returned data in the web service using the method indicated by the identifier.
3. A method according to claim 1, wherein the digital repository includes a plurality of nodes, each node having at least one property, each property having an identifier pointing to data or a method.
4. A method according to claim 3, wherein at least one node has a property including data and a further property including an identifier pointing to a direct processing method for directly processing a result, the method further comprising: sending from a client to the digital repository a request to access a data item of a node, wherein the data item stored in the repository includes an identifier pointing to a direct processing method for processing a result and an identifier to data of the same node; running the interface routine pointed to by the identifier to process the data pointed to by the identifier using the direct processing method to obtain processed data; returning the processed data from the repository to the client as the response to the request; and obtaining an interface description document including at least one data processing identifier pointing to the at least one data processing routine and at least one data identifier pointing to data.
5. A method according to claim 3, wherein: the digital repository includes at least one collection node, the collection node including an identifier to at least one data node and an identifier to at least one data processing routine; and the step of obtaining an interface description document includes returning, as the data processing identifiers, the identifiers of those data processing routines pointed to by those collection nodes that include an identifier to the data node to which the data item relates.
6. A method according to claim 1, wherein the interface description document is a Web Services Description Language (WSDL) document.
7. A method according to claim 1, wherein the digital repository includes a number of collections, a number of routines being associated with the collections; and the step of obtaining an interface description document includes: identifying which, if any, collections the requested data item belongs to; and returning as the interface description document an interface description document listing each routine, associated with each collection to which the requested data item relates, that is capable of processing the requested data item.
8. A method of processing in a digital repository containing data items, comprising: receiving from a client a request to access a data item, wherein the data item stored in the repository includes an identifier pointing to an interface routine; running the interface routine pointed to by the identifier to obtain an interface description document including at least one identifier pointing to at least one data processing routine and an identifier identifying data in the repository; returning the interface description document from the repository to the client as the response to the request; receiving a request from a data processing routine identified in the interface description document to access the data identified by the identifier identifying data; and returning the identified data in the repository to the data processing routine to allow the data processing routine to process the data.
9. A method of adding a data processing routine to a node of a digital repository including a plurality of nodes, each node having at least one property, each property having an identifier pointing to data or a method, the method including: adding a property to the node, the added property including an identifier of the data processing routine to be added and an identifier of the data of the node to be processed by the data processing routine.
10. A method of adding a data processing routine to a data node of a digital repository including a plurality of nodes, wherein the digital repository includes at least one collection node, the method comprising: adding a property to a collection node including the identifier of the data processing routine; and adding a property to the collection node to point to the data node to which the data processing routine is to be added.
11. A digital repository comprising: a data store including a plurality of data items; code for receiving from a client a request to access a data item, wherein the data item stored in the repository includes an identifier pointing to an interface routine; code for running the interface routine pointed to by the identifier to obtain an interface description document including at least one identifier pointing to at least one data processing routine and an identifier identifying data in the repository; code for returning the interface description document from the repository to the client as the response to the request; code for receiving a request from a data processing routine identified in the interface description document to access the data identified by the identifier identifying data; and code for returning the identified data in the repository to the data processing routine to allow the data processing routine to process the data.
12. A digital repository system comprising: a data store storing a plurality of nodes, each node including at least one property, wherein the property is a uniform resource identifier (URI); a plurality of data nodes, each data node containing at least one identifier being or pointing to data; and at least one collection node, the collection node having a property pointing to a data node containing an identifier pointing at a method of getting an interface; so that accessing the property of the data node including the identifier pointing at a method of getting an interface returns an interface description document describing at least one method that may be applied to the data of the data node and how the at least one method may be invoked.
13. A computer program product on a data carrier, for controlling a digital repository, arranged to cause the digital repository to carry out the steps of: receiving from a client a request to access a data item, wherein the data item stored in the repository includes a uniform resource identifier (URI) pointing to an interface routine; running the interface routine pointed to by the identifier to obtain an interface description document including at least one data processing identifier pointing to at least one data processing routine and an identifier identifying data in the repository; returning the interface description document from the repository to the client as the response to the request; receiving a request from a data processing routine identified in the interface description document to access the data identified by the identifier identifying data; and returning the identified data in the repository to the data processing routine to allow the data processing routine to process the data.
14. A computer program product according to claim 13, for controlling a digital repository including at least one collection node, the collection node including an identifier to at least one data node and an identifier to at least one data processing routine, the computer program product being arranged to return the interface description document by returning, as the data processing identifiers, the identifiers of the data processing routines pointed to by the collection nodes that point to the data node to which the data item relates.